Common and novel haplotype structures between different types of cancer

Abstract Background: Genome-wide association studies (GWAS) have identified hundreds of genetic variants associated with cancer risk. GWAS data are important for cancer prevention and understanding the underlying mechanisms of cancer. Aims: This study aimed to investigate the genetic association between different types of cancer using GWAS data and a bioinformatics approach. Methods and results: The significant GWAS variants associated with more than one cancer type were identified. Common linkage disequilibrium (LD) variants between different types of cancer were identified using 1000 Genomes phase 3 LD data. Haplotype blocks were identified by analyzing 1000 Genomes phase 3 genotyping data in the GWAS populations. Subsequent analyses included functional SNP analyses and TCGA gene expression. The results associated with significant GWAS variants (P<5E-8) showed the following haplotype associations in the European population: the GT rs4808075-rs8170 haplotype on BABAM1 with breast and ovarian cancers, the GC rs16857609-rs11693806 haplotype on DIRC3 with breast and thyroid cancers, the GCG rs380286-rs401681-rs31487 haplotype on CLPTM1L with skin and lung cancers, the GGG rs4430796-rs11651052-rs11263763 haplotype on HNF1B with prostate and endometrial cancers, and the GT rs10505477-rs6983267 haplotype on CASC8 with colorectal and prostate cancers. All these genes had significantly different expression in tumor tissues (P<1E-3). In addition, the rs11693806 variant is located in the hsa-miR-873-5p binding site and has an enhancing effect on the hsa-miR-873-5p:DIRC3 interaction. Conclusion: These novel haplotype structures and miRNA:lncRNA interactions are important for understanding the common genetic link between cancers. These results can potentially be used in genetic panels.

Based on cancer statistics for 2022, cancer deaths and overall incidence have declined since 1991, but cancer is still the second leading cause of death in the world after heart disease. 1 The decrease in cancer mortality is associated with reductions in smoking and other cancer risk factors and with progress in screening tests, diagnosis, and treatment. 1,2 However, the incidence of cancer worldwide is expected to increase in the coming decades because of demographic factors such as population growth and aging, and is predicted to double between 2020 and 2070. 3 There are more than 200 types of cancer, and many can be prevented or treated effectively if diagnosed early. 4 After the outbreak of coronavirus disease, the diagnosis and treatment of cancer were affected. 1 As a multifactorial disease, cancer is influenced by different risk factors such as environmental factors, age, obesity, and genetics. Some cancer incidence is related to inherited genetic factors. On the other hand, genetic changes related to cancer can also arise by chance or as a result of various factors such as environmental exposures. 6,7 In recent years, genome-wide association studies (GWAS) have investigated the role of genetic variants (genotype) in the risk of cancer (phenotype) and have found many variants and genes associated with cancer or with response to drug treatment. 8,9 GWAS data can be used to identify clinical risk factors, uncover underlying mechanisms, develop drug targets, and predict complex traits such as cancer. 10,11 However, the biological functions of many GWAS variants and their mechanisms of action remain unclear.
In this study, for the first time, we aimed to investigate all cancer-associated GWAS variants to find common genetic bases and associations, including variants, genes, and haplotypes, between different types of cancer using 1000 Genomes phase 3 data, TCGA data, and bioinformatics approaches.

| Study pipeline
GWAS significant variants associated with cancers were extracted from the full set of GWAS significant variants. Then, the GWAS significant variants reported in more than one type of cancer were determined. The 1000 Genomes phase 3 LD data were used to identify common linkage disequilibrium (LD) variants between different types of cancer and candidate haplotypic variants associated with GWAS significant LD variants.
After that, SNP functional analysis and TCGA gene expression analysis were performed on the results. The complete flowchart is shown in Figure 1.

| Common GWAS LD variants
The GWAS catalog (gwas_catalog_v1.0.2_associations_e0_r) was downloaded from GWAS catalog-EMBL-EBI (https://www.ebi.ac.uk/gwas/) to find common GWAS variants. Significant GWAS variants associated with cancer types were identified using the names of cancers in the disease/trait section. These names are presented in Data S1. After that, the GWAS significant variants (P<5E-8) associated with cancers (n=2562) were combined based on variant names in the R programming language (version 4.2.1), which yielded the GWAS significant variants (P<5E-8) found in more than one type of cancer (n=110). For this purpose, we created a dataset of GWAS significant variants and GWAS traits in SAV format, and then used the haven package and wrote a specific script to split the data and combine them based on GWAS traits (Data S2). Finally, variants with more than one cancer type were identified by sorting the results.
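The filtering step described above can be illustrated with a short sketch. This is not the authors' R script (Data S2). It is a minimal Python version that assumes the catalog rows have already been reduced to (variant, cancer type, p-value) triples. The function name, the placeholder variant IDs, and the p-values are hypothetical and serve only to show the logic.

```python
from collections import defaultdict

# Genome-wide significance threshold used in the study (P < 5E-8).
P_THRESHOLD = 5e-8

# Toy rows (variant ID, cancer type, p-value). IDs and p-values are
# placeholders invented for illustration, not values from the GWAS catalog.
ASSOCIATIONS = [
    ("rsA", "breast cancer", 1e-12),
    ("rsA", "ovarian cancer", 3e-9),
    ("rsB", "lung cancer", 2e-10),
    ("rsB", "lung cancer", 4e-9),    # same cancer reported twice
    ("rsC", "prostate cancer", 1e-9),
    ("rsC", "colorectal cancer", 1e-5),  # fails the threshold
]


def variants_in_multiple_cancers(rows, threshold=P_THRESHOLD):
    """Return {variant: sorted cancer types} for variants that are
    significant (p < threshold) in more than one distinct cancer type."""
    cancers_by_variant = defaultdict(set)
    for variant, cancer, p_value in rows:
        if p_value < threshold:
            cancers_by_variant[variant].add(cancer)
    return {
        variant: sorted(cancers)
        for variant, cancers in cancers_by_variant.items()
        if len(cancers) > 1
    }


if __name__ == "__main__":
    for variant, cancers in variants_in_multiple_cancers(ASSOCIATIONS).items():
        print(variant, cancers)
```

Collecting cancer types in a set per variant means repeated reports of the same cancer do not count as a second cancer type, which is the distinction the sorting step in the text is meant to capture.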
The Ensembl genome browser 110 (https://asia.ensembl.org/index.html) LD calculator was used for GWAS significant variants to find LD variants (D′ and r² ≥ .6) for all 1000 Genomes phase 3 populations. Then, to find haplotypic blocks between these significant GWAS LD variants in specific GWAS super-populations, the 1000 Genomes phase 3 data containing GWAS LD variants were downloaded from Ensembl Genome Browser 110 (https://asia.ensembl.org/index.html). 12 Haplotypic blocks and LD plots were generated using HaploView V4.2.
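The LD criterion in this step rests on two standard pairwise statistics. For two biallelic variants with allele frequencies p_A and p_B and haplotype frequency p_AB, D = p_AB − p_A·p_B, r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)), and D′ = |D| / D_max, where D_max is the largest |D| possible given the allele frequencies. The sketch below computes both from phased haplotypes. It is an illustration only: the function names and toy haplotype counts are hypothetical, reading the threshold as "both D′ and r² at least .6" is an assumption, and the Ensembl calculator's internal implementation may differ.

```python
def ld_stats(haplotypes):
    """Compute (D', r^2) for two biallelic variants.

    haplotypes: list of (allele_at_variant_1, allele_at_variant_2) tuples,
    one per phased chromosome. The alleles of the first haplotype are
    used as the reference alleles A and B.
    """
    n = len(haplotypes)
    ref_a, ref_b = haplotypes[0]
    p_a = sum(h[0] == ref_a for h in haplotypes) / n
    p_b = sum(h[1] == ref_b for h in haplotypes) / n
    p_ab = sum(h == (ref_a, ref_b) for h in haplotypes) / n

    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("LD is undefined when a variant is monomorphic")
    r2 = d * d / denom

    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d_prime, r2


def passes_ld_threshold(haplotypes, cutoff=0.6):
    """Assumed reading of the criterion: both D' and r^2 >= cutoff."""
    d_prime, r2 = ld_stats(haplotypes)
    return d_prime >= cutoff and r2 >= cutoff


if __name__ == "__main__":
    # Toy sample of 10 phased chromosomes (invented counts).
    toy = [("G", "T")] * 5 + [("G", "C")] + [("A", "C")] * 4
    print(ld_stats(toy), passes_ld_threshold(toy))
```

In the toy sample only three of the four possible haplotypes occur, so D′ reaches its maximum while r² stays below 1, which is why the two statistics are usually reported together.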

| Gene expression
The genes with haplotypic blocks associated with GWAS significant variants were included in the gene expression analysis. In this analysis, up- or downregulation in all TCGA tumor tissues was compared with normal tissues using OncoDB (https://oncodb.org/index.html) and the TIMER2.0 (http://timer.cistrome.org/) Gene_DE module via the Wilcoxon test. Results with p-value <.001 were considered significant. 17,18

| RESULTS

| Common GWAS significant LD variants and genes between different types of cancer
The significant GWAS variants associated with cancers were identified from the GWAS catalog. There were 110 GWAS significant variants in more than one type of cancer (Data S3). Twenty-six GWAS significant variants on eight genes were in LD and common between different types of cancer based on the 1000 Genomes phase 3 data. Most of them were intronic eQTL variants. The results are shown in Table 1. DIRC3 is an lncRNA, and the rs11693806 variant was located in the hsa-miR-873-5p:DIRC3 binding site (Table 2). None of the other GWAS significant and LD variants were located in a miRNA binding site (Data S4) or near any motifs associated with the identified genes (Data S5).

| Haplotypic blocks
Further analysis of the 1000 Genomes phase 3 genotyping data revealed candidate haplotypic blocks in five genes. The results are presented in Figure 2.

| Gene expression based on TCGA data
Based on the TCGA data, the expression of the five genes with haplotypic structures was significantly different from normal tissues (P<1E-3).
The results for HNF1B, CASC8, BABAM1, and CLPTM1L are shown in Figure 3. The DIRC3 results are shown in Data S7.
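The tumor-versus-normal comparison behind these results (Wilcoxon test, p-value <.001) can be sketched as follows. This is a minimal two-sided Wilcoxon rank-sum test using the normal approximation with a tie correction and no continuity correction. It is not the OncoDB or TIMER2.0 implementation, and the function name and expression values are invented for illustration.

```python
import math


def rank_sum_test(tumor, normal):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for the tumor group, p-value).
    """
    n1, n2 = len(tumor), len(normal)
    n = n1 + n2
    # Pool the samples, remembering group membership (0 = tumor, 1 = normal).
    pooled = sorted([(v, 0) for v in tumor] + [(v, 1) for v in normal])

    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for ties
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_tumor = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_tumor = rank_sum_tumor - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_tumor, 1.0
    z = (u_tumor - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return u_tumor, p_value


if __name__ == "__main__":
    # Invented expression values for one gene in 10 tumor and 10 normal samples.
    tumor = [7.1, 7.4, 6.9, 7.8, 8.0, 7.3, 7.6, 7.2, 7.9, 7.5]
    normal = [5.2, 5.9, 6.1, 5.5, 5.8, 6.0, 5.4, 5.7, 5.6, 5.3]
    u, p = rank_sum_test(tumor, normal)
    print(u, p, "significant" if p < 0.001 else "not significant")
```

A rank-based test is a common choice for expression data because it makes no assumption that the values are normally distributed.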

| DISCUSSION
This study investigated the shared role of GWAS significant genetic factors in different types of cancer. In this study, for the first time, the association of the 26 LD variants on eight genes with different types of cancer was identified. The rs11693806 variant of the lncRNA DIRC3 is located in the hsa-miR-873-5p binding site and has a gain effect on the hsa-miR-873-5p:DIRC3 interaction. 21 Previous studies have investigated some genome-wide haplotype associations, and new methods have used chromosome-scale, haplotype-resolved reconstruction to characterize the precise structural variant landscape of cancer. 22,23 However, in this study, for the first time, haplotypic GWAS significant LD variants were identified for at least two types of cancer. The haplotypic structures were identified in the DIRC3, BABAM1 (C19orf62), HNF1B, CLPTM1L, and CASC8 genes.
The GT haplotypic structure of the rs4808075-rs8170 variants on the BABAM1 gene is associated with both breast and ovarian cancers in the European population. These two variants are located 578 nucleotides from each other. The gene expression results showed that BABAM1 gene expression is significantly different from adjacent normal tissues in breast and ovarian cancer. 26,27 Also, rs4808075 is a pleiotropic cancer susceptibility variant associated with five types of cancer, including breast and ovarian cancer. 24
The GC haplotypic structure of the rs16857609-rs11693806 variants on the DIRC3 gene is associated with breast and thyroid cancers in the European population. These two variants are located 4350 nucleotides from each other. The gene expression results showed that DIRC3 gene expression is significantly different from adjacent normal tissues in breast and thyroid cancer. The role of the rs16857609 and rs11693806 variants in breast or thyroid cancer has been investigated separately in previous studies. 19,20,28,29 In addition, the association of other DIRC3 variants with breast and thyroid cancer was investigated in a previous study. 30
The GCG haplotypic structure of the rs380286-rs401681-rs31487 variants on the CLPTM1L gene is associated with skin and lung cancers in the European population. The gene expression results showed that CLPTM1L gene expression is significantly different from adjacent normal tissues in skin and lung cancers. 32-34 Other studies identified the association of the rs401681 variant with lung or skin cancers in other populations. 35,36
The GGG haplotypic structure of the rs4430796-rs11651052-rs11263763 variants (5 kb distance) on the HNF1B gene is associated with prostate and endometrial cancers in the European population. The gene expression results showed that HNF1B gene expression is significantly different from adjacent normal tissues in endometrial cancer. 38-44
Furthermore, the GT haplotypic structure of the rs10505477-rs6983267 variants (5 kb distance) on the CASC8 gene is associated with colorectal and prostate cancers in the European population. The gene expression results showed that CASC8 gene expression is significantly different from adjacent normal tissues in colorectal and prostate cancers. 46-53 Further studies identified the association of rs10505477 with colorectal and prostate cancers in other populations. 54,55
In conclusion, this study identified five novel haplotype structures and one miRNA:lncRNA interaction common to more than one type of cancer, which is important for understanding the genetic

Table 1. GWAS significant LD variants (P<5E-8) in more than one type of cancer.

Figure 2. HNF1B GGG haplotype in the European population associated with prostate and endometrial cancers, CLPTM1L GCG haplotype in the European population associated with skin and lung cancers, CASC8 GT haplotype in the European population associated with colorectal and prostate cancers, DIRC3 GC haplotype in the European population associated with breast and thyroid cancers, and BABAM1 (C19orf62) GT haplotype in the European population associated with breast and ovarian cancers. Other results are shown in Data S6.